Effect of Smoking on the Body Mass Index-Mortality Relation: Empirical
Evidence from 15 Studies


The BMI in Diverse Populations Collaborative Group*


The authors examined the impact of smoking status on the relation between body mass index (weight (kg)/
height (m)^2) and mortality across a group of 15 diverse observational studies. The studies included a
heterogeneous mix of national samples, cohort studies with mortality follow-up, and clinical trials.
Consideration of the data according to natural strata resulted in the formation of 42 analytic cohorts. The authors
examined survival through the end of follow-up for each study, as influenced by body mass index, age, and
current smoking status at baseline, using a proportional hazards model to describe the relation between body
mass index and mortality with control for age and smoking status. In this paper, the authors demonstrate that
the estimated body mass index of minimum mortality changes when data are analyzed while ignoring smoking
status; but they also demonstrate through a simulation study that eliminating smokers from the data sets prior
to analysis produces results similar to those expected from the elimination of numerically similar random
proportions of the data sets prior to analysis. Based on the results of these analyses, the authors find no support
for the commonly held practice of eliminating smokers from a data set prior to examining the body mass
index-mortality relation. Am J Epidemiol 1999;150:1297-308.
Many researchers have examined the consequences
of obesity in order to ascertain the health effects of 
excess weight. One important component of this 
research has been the determination of the range of 
body composition associated with minimal morbidity 
and mortality. Longitudinal studies provide evidence 
concerning the parameters associated with minimal 
risk, but while findings often appear to reinforce one 
another, interpretations vary. Differences in study 
designs and methods of analysis may explain the 
divergent outcomes. 

For more than half a century, the insurance industry
has shown weight (adjusted for height) to be a predictor
of mortality, and beginning in 1955 the Metropolitan
Life Insurance Company established tables of
desirable weights by height for general use.
Epidemiologists now prefer to use a single measure called
the body mass index (BMI), defined as weight (in
kilograms) divided by height (in meters) squared, to
determine obesity. The latest federal guidelines are based on
this calculation. (The most recent recommendations
state that individuals with a BMI greater than 25 are at
risk of developing diseases associated with overweight
and obesity (1).)

Epidemiologists have examined various health risk 
factors to determine whether they affect the relation 
between BMI and morbidity and mortality. One such 
risk factor discussed extensively is cigarette smoking; 
but while all investigators acknowledge the need to 
determine the impact of smoking on this relation, the 
appropriate methods for achieving this aim remain 
subject to debate. The question is whether the shape of 
the curve describing mortality in terms of BMI (i.e., 
the parameters of that curve) differs for smokers and 
nonsmokers. In technical terms, this problem is formulated
by asking whether there is an interaction between
cigarette smoking and weight or whether cigarette 
smoking is a confounder in this relation. Reports on 
this subject have been inconsistent. 

We conducted an analysis of person-level data from
15 studies (2-19) to examine the effect of smoking on
the relation between BMI and mortality by addressing
the following questions: Is the shape of the curve for
mortality by BMI different for smokers and nonsmokers?
Does failure to consider cigarette smoking as a
confounder alter the point of minimum mortality?
Does restricting an analysis to nonsmokers produce
results that differ from those obtained when one
randomly reduces the size of the analytic sample?
MATERIALS AND METHODS
Studies 

To obtain a heterogeneous sample, we included in our
analyses national samples with mortality follow-up,
cohort studies with mortality follow-up, and clinical
trials. The national samples included the pooled National
Health Interview Survey data for 1987-1990 (2) and
data from the First (3) and Second (4) National Health
and Nutrition Examination Surveys. Cohort studies
included the Framingham Heart Study (5), the Puerto
Rico Heart Health Program (6), the Tecumseh
Community Health Study (7), the Yugoslavia
Cardiovascular Disease Study (8), the Honolulu Heart
Program (9), the Scottish Collaborative Study (10), the
Renfrew and Paisley Survey (11), the Israeli Ischemic
Heart Disease Study (12), the Glostrup Cohort (13-16),
and the Lipid Research Clinics Program Prevalence
Study (17). Clinical trials included the Hypertension
Detection and Follow-up Program (18) and the Multiple
Risk Factor Intervention Trial (19). Baseline data
collection began in 1948 with the Framingham Heart Study
and continued through 1990 with the National Health
Interview Survey. Data samples included people of both
sexes and White and Black subgroups. Follow-up varied
from 6 years for the National Health Interview Survey
to 30 years for the Framingham cohort (table 1). Our
final analytic sample was taken from 15 studies,
comprising 250,182 participants and 38,532 deaths (table 1).

The studies analyzed often contained subgroups
based on factors such as sex, race/ethnicity, area of
residence, and treatment status (in the case of clinical
trials). These subgroups were analyzed separately if they
met the criterion of having at least 15 deaths for each
parameter included in our most complex model. The
Lipid Research Clinics study and the Multiple Risk
Factor Intervention Trial contained too few African
Americans to permit separate analyses. In all, 42
cohorts were analyzed (table 2).

We included in our analyses participants with known
values for age, smoking status, and BMI, which
necessitated only a small number of exclusions. The studies
exhibited a broad range of BMI levels,
from an average BMI of 22.2 for the rural
Yugoslavian male cohort to 30.2 for African-American
women from the referred-care group of the
Hypertension Detection and Follow-up Program. Although
average age did not differ greatly among the cohorts,
the age ranges showed substantial variation (tables 1
and 2). When we classified only current smokers as
smokers, we found their proportion to vary from
slightly more than 25 percent among white females in
the National Health Interview Survey to over 71
percent among rural Yugoslavian men (table 2). Since we
did not have information on number of cigarettes
smoked per day for all cohorts, we simply separated
participants into smokers and nonsmokers.

Detailed descriptions of the studies included in our 
analysis are provided in the Appendix. 

Statistical methods 

To model the relation between BMI and mortality,
we used a single algorithmic approach to subject
individual cohorts to a uniform method of analysis. A
complete report on the development of our methods has
been published elsewhere (20).


TABLE 1. Studies included in a person-level meta-analysis of body mass index and mortality

Study                                         Baseline       No. of      No. of  No. of years  Age range
                                              year(s)  observations      deaths  of follow-up    (years)
National Health Interview Survey              1987-1990     121,208       9,577             6      18-90
NHANES* I                                     1971-1975      12,730       4,016            19      24-77
NHANES II                                     1976-1980       9,064       2,106            14      30-75
Framingham Heart Study                        1948-1951       5,163       1,964            30      28-62
Puerto Rico Heart Health Program              1965            9,776       1,726            15      35-79
Tecumseh Community Health Study               1959-1960       4,580         959            18      18-91
Yugoslavia Cardiovascular Disease Study       1964            6,450       1,337            16      34-62
Honolulu Heart Program                        1965            8,005       2,466            21      45-68
Scottish Collaborative Study                  1970-1973       7,008       2,079            22      21-75
Renfrew and Paisley Survey                    1972-1976      15,394       4,443            17      45-64
Israeli Ischemic Heart Disease Study          1963           10,027       3,464            22      39-74
Glostrup Cohort                               1977-1992      10,135       1,100             9      30-80
Lipid Research Clinics Program                1971-1976       8,175         947            12      30-97
Hypertension Detection and Follow-up Program  1973-1974      10,908       1,415             8      30-69
Multiple Risk Factor Intervention Trial       1973-1975      11,559         933            10      35-58
Total                                                       250,182      38,532

* NHANES, National Health and Nutrition Examination Survey.
With few exceptions, investigators report the relation
between BMI and mortality to be nonmonotonic, in that
excess mortality is associated with both high and low
BMIs. To model such a relation and to account for
asymmetry, we introduced a transformation of BMI
into “lean” BMI (1/BMI) as the independent variable in
a proportional hazards model. Previous analyses had
suggested that the relation between BMI and mortality
was asymmetric (20), so we sought a transformation to
normality to account for this. In seeking a
transformation to normality, we were influenced by Cornfield et
al. (21), who demonstrated that if one assumes
normality for both deaths and nondeaths and if, in addition, the
variance differs for the two distributions, a quadratic
model is necessary to describe the relation adequately
in a logistic regression. This transformation was
suggested by Nevill and Holder (22), who termed 1/BMI
the “lean” BMI (LBMI). They demonstrated that LBMI
was normally distributed using data from the Allied
Dunbar National Fitness Study and that it was more
closely related to percentage of body fat than BMI.

We used a proportional hazards model to describe
the relation between LBMI and mortality, fitting each
cohort to determine which model best described the
relation. We assumed the following hazard:

\[
\lambda(t) = \lambda_0(t) \times \exp\{\beta_1(\text{age}) + \beta_2(\text{smoking}) + \beta_3(\text{LBMI}) + \beta_4(\text{LBMI}^2) + \beta_5(\text{smoking} \times \text{LBMI}) + \beta_6(\text{smoking} \times \text{LBMI}^2)\},
\]

where smoking is an indicator variable, coded 1 for
smokers and 0 for nonsmokers, and the $\beta$'s comprise a
vector of unknown parameters to be estimated using
maximum likelihood (given a cohort with information
on the characteristics of participants and deaths during
follow-up). For our purposes, we assumed $\beta_1$ and $\beta_2$ to
be nonzero and developed an algorithm to determine
which of the remaining parameters were necessary to
best fit the data for each cohort.
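
As an illustration only, the exponent of the hazard above (the log relative hazard) can be sketched in Python. The function names `lean_bmi` and `log_relative_hazard` are hypothetical, and any coefficient values passed to them are placeholders rather than estimates from the cohorts analyzed here.

```python
def lean_bmi(weight_kg, height_m):
    """Return LBMI = 1/BMI, where BMI = weight (kg) / height (m)^2."""
    bmi = weight_kg / height_m ** 2
    return 1.0 / bmi


def log_relative_hazard(beta, age, smoker, lbmi):
    """Linear predictor of the proportional hazards model.

    beta is a sequence (b1, ..., b6) matching the six terms of the hazard:
    age, smoking, LBMI, LBMI^2, smoking x LBMI, smoking x LBMI^2.
    """
    b1, b2, b3, b4, b5, b6 = beta
    s = 1 if smoker else 0  # indicator: 1 for smokers, 0 for nonsmokers
    return (b1 * age + b2 * s + b3 * lbmi + b4 * lbmi ** 2
            + b5 * s * lbmi + b6 * s * lbmi ** 2)
```

Setting the last two coefficients to zero gives the same BMI curve for smokers and nonsmokers, shifted only by the smoking main effect; nonzero values let the curve itself differ by smoking status.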

We used a hierarchical modeling algorithm to
decide which of four models was best supported by
the data in each cohort. The four models considered
were as follows.

Model 1. $\beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$.

This model assumes that only age and smoking are
significant predictors of mortality.

Model 2. $\beta_3 \neq 0$; $\beta_4 = \beta_5 = \beta_6 = 0$.

This model assumes that LBMI is related to mortality
in a monotonic fashion. The direction of the relation
depends on the sign of $\beta_3$.

Model 3. $\beta_3 \neq 0$; $\beta_4 \neq 0$; $\beta_5 = \beta_6 = 0$.

This model assumes a quadratic relation between
LBMI and mortality.

Model 4. $\beta_3 \neq 0$; $\beta_4 \neq 0$; $\beta_5 \neq 0$; $\beta_6 \neq 0$.

The final model assumes that the basic relation
between LBMI and mortality differs for smokers and
nonsmokers.

We began by assuming that model 1 was the correct
model and then examined whether each successively
more complex model added significant information.
The decision that a more complex model added
significant information to a less complex model was made
on the basis of a generalized likelihood ratio statistic.
If model 1 was found to fit best, we concluded that the
cohort showed no relation between BMI and mortality.
If model 2 was selected, we determined that there was
either a direct relation or an inverse relation,
depending on the sign of $\beta_3$. If the third model fitted best, we
considered the relation to be quadratic; and for model
4, we classified it as exhibiting interaction.
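
The stepwise comparison can be sketched as follows. This is one plausible reading of the procedure, not the published algorithm (which is described in reference 20): each more complex model is compared with the best model so far using a likelihood ratio statistic at the 0.05 level. The function name, the input format, and the comparison order are assumptions.

```python
# Upper 5 percent points of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Number of regression parameters in each of the four nested models.
N_PARAMS = {1: 2, 2: 3, 3: 4, 4: 6}

LABELS = {1: "none", 2: "linear", 3: "quadratic", 4: "interaction"}


def select_model(loglik):
    """Pick a model from maximized log likelihoods.

    loglik maps model number (1-4) to the maximized log (partial) likelihood
    of that model fitted to one cohort. Assumed rule: step to a more complex
    model only when the likelihood ratio statistic exceeds the 5 percent
    chi-square critical value for the added parameters.
    """
    best = 1
    for candidate in (2, 3, 4):
        df = N_PARAMS[candidate] - N_PARAMS[best]
        lr = 2.0 * (loglik[candidate] - loglik[best])
        if lr > CHI2_CRIT_05[df]:
            best = candidate
    return best
```

For a linear model, the direction (direct or inverse) would then be read from the sign of the fitted LBMI coefficient, as described above.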

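For those cases in which the quadratic model was
deemed most appropriate, the estimated BMI of
minimum mortality (BMImin) and its standard error were
calculated using standard statistical procedures (20).
Because the log hazard is quadratic in LBMI = 1/BMI, its
extremum lies at LBMI = $-\beta_3/(2\beta_4)$, so that

\[
\mathrm{BMI}_{\min} = -\frac{2\beta_4}{\beta_3},
\]

and the variance of BMImin was estimated using the
delta method:

\[
\mathrm{Var}(\mathrm{BMI}_{\min}) = \frac{4\beta_4^2}{\beta_3^2}\left[\frac{\mathrm{Var}(\beta_3)}{\beta_3^2} - \frac{2\,\mathrm{Cov}(\beta_3, \beta_4)}{\beta_3\beta_4} + \frac{\mathrm{Var}(\beta_4)}{\beta_4^2}\right].
\]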
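
A minimal numerical sketch of this calculation, assuming the minimum follows from the quadratic in LBMI (so that BMImin = -2β4/β3) and applying the first-order delta method. The function name and the input values used to exercise it are hypothetical.

```python
import math


def bmi_min_and_se(b3, b4, var_b3, var_b4, cov_b34):
    """Return (BMI of minimum mortality, delta-method standard error).

    b3 and b4 are the fitted coefficients of LBMI and LBMI^2; the remaining
    arguments are their estimated variances and covariance.
    """
    bmi_min = -2.0 * b4 / b3
    # Partial derivatives of g(b3, b4) = -2*b4/b3.
    d3 = 2.0 * b4 / b3 ** 2
    d4 = -2.0 / b3
    variance = d3 ** 2 * var_b3 + 2.0 * d3 * d4 * cov_b34 + d4 ** 2 * var_b4
    return bmi_min, math.sqrt(variance)
```

The estimate is a minimum only when the fitted LBMI^2 coefficient is positive; a caller would need to check that separately.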


TABLE 2. Analytic cohorts and results used in a meta-analysis relating body mass index to mortality

Subjects                            No. of  No. of    BMI*   BMI*  Age (y)  Age (y)        %  Best model   BMImin†  Standard
                              observations  deaths    mean    SD†     mean       SD  smokers  (relation)               error

National Health Interview Survey
  White females                     59,036   4,228    24.3    5.0     46.7     18.9     25.4  Quadratic       23.8      0.35
  White males                       44,866   3,829    25.6    3.9     44.3     17.4     29.5  Interaction
  Black females                     11,171     779    26.9    6.0     43.4     17.8     28.2  Quadratic       26.3      0.72
  Black males                        6,135     741    26.0    4.6     44.4     17.4     37.1  Quadratic       26.9      0.96
First National Health and Nutrition Examination Survey
  White females                      6,466   1,538    25.3    5.4     47.7     15.4     30.6  Quadratic       23.4      0.45
  White males                        4,430   1,774    25.8    4.0     51.1     15.1     41.4  Quadratic       23.5      0.41
  Black females                      1,184     373    27.9    B.7     ^7.7     15.2     36.4  Quadratic       26.7      1.08
  Black males                          650     331    25.7    5.0     53.0     14.9     47.5  Quadratic       25.5      0.61
Second National Health and Nutrition Examination Survey
  White females                      4,267     775    25.9    5.S     54.5     13.4     28.4  Quadratic       25.2      0.83
  White males                        3,804   1,102    25.9    3.9     54.4     13.2     35.8  None
  Black females                        547      97    28.7    6.4     53.6     13.2     23.2  Quadratic       26.3      1.56
  Black males                          446     132    2S.B    4.7     54.2     13.7     47.1  None
Framingham Heart Study
  White females                      2,850     886    25.3    4.7     44.1      8.5     41.1  Quadratic       23.5      0.59
  White males                        2,313   1,078    25.7    3.5     44.1      8.6     64.6  Quadratic       23.0      0.53
Puerto Rico Heart Health Program
  Urban males                        6,806   1,234    26.0    4.1     54.2      6.4     41.2  Quadratic       23.6      0.87
  Rural males                        2,970     492    23.3    3.5     55.0      7.0     49.6  Quadratic       22.9      0.60
Tecumseh Community Health Study
  White females                      2,389     388    25.2    5.2     41.0     15.7     35.1  Quadratic       22.1      1.21
  White males                        2,191     571    25.5    3.8     41.5     14.9     60.2  Quadratic       23.7      0.73
Yugoslavia Cardiovascular Disease Study
  Urban males                        3,548     683    24.3    3.6     45.2      7.5     68.5  Quadratic       24.3      0.61
  Rural males                        2,902     654    22.2    2.5     46.7      7.3     71.6  Quadratic       24.6      1.09
Honolulu Heart Program
  Males                              8,006   2,466    23.8    3.2     54.4      5.6     43.7  Quadratic       20.9      0.36
Scottish Collaborative Study
  Males                              6,006   1,904    25.1    3.1     47.6      7.3     55.1  Quadratic       22.5      0.52
  Females                            1,002     175    24.7    3.8     47.1      6.7     58.8  None
Renfrew and Paisley Survey
  Males                              7,055   2,549    25.9    3.4     54.1      5.6     56.6  Quadratic       24.1      0.71
  Females                            8,339   1,894    25.8    4.5     54.4      5.6     46.7  Interaction

Table continues

We reanalyzed the relation between BMI and mortality
for each cohort after eliminating smokers from the
data set. To determine whether this exclusion would
produce significant changes in the relation, we
performed an approximate randomization test (23). This
test randomly assigned smoking status to individuals
from each cohort. The number of random participants
designated smokers equaled the number of actual
smokers within each cohort. These designated smokers
were then excluded from the analysis. By repeating this
process 1,000 times, we were able to judge whether the
elimination of smokers produced results that differed
significantly from those due to random exclusions.
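
The random-exclusion step can be sketched as follows. This is a simplified illustration, not the authors' program: `select_model` stands for any user-supplied routine that fits the candidate models to a reduced cohort and returns a label, and all names here are hypothetical.

```python
import random


def random_exclusion(records, n_smokers, rng):
    """Drop n_smokers randomly chosen participants, ignoring true smoking status."""
    pseudo_smokers = set(rng.sample(range(len(records)), n_smokers))
    return [r for i, r in enumerate(records) if i not in pseudo_smokers]


def randomization_distribution(records, is_smoker, select_model,
                               n_reps=1000, seed=0):
    """Tabulate the relation selected after each of n_reps random exclusions.

    The number of participants excluded in each repetition equals the number
    of actual smokers in the cohort.
    """
    rng = random.Random(seed)
    n_smokers = sum(1 for r in records if is_smoker(r))
    counts = {}
    for _ in range(n_reps):
        reduced = random_exclusion(records, n_smokers, rng)
        label = select_model(reduced)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

The resulting counts give the reference distribution against which the relation selected for true nonsmokers can be compared.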

RESULTS 

Table 2 shows the results obtained from modeling
the relation between BMI, age, smoking, and mortality.
For 30 of the 42 analytic cohorts, we found the
quadratic relation between BMI and mortality to be the
most appropriate, while controlling for smoking. For
eight of the remaining cohorts, we found no relation
between BMI and mortality. One remaining cohort
showed a direct relation between BMI and mortality,
and the other three showed interactive relations in
which the relation between BMI and mortality differed
for smokers and nonsmokers.
TABLE 2. Continued

Subjects                            No. of  No. of    BMI*   BMI*  Age (y)  Age (y)        %  Best model   BMImin†  Standard
                              observations  deaths    mean    SD†     mean       SD  smokers  (relation)               error

Israeli Ischemic Heart Disease Study
  Males                             10,027   3,464    25.7    3.3     49.4      6.9     51.9  Interaction
Glostrup Cohort
  Females                            5,059     437    24.2    4.3     48.9     13.7     46.8  Quadratic       24.8      0.83
  Males                              5,076     663    25.4    3.5     47.6      7.3     56.8  Quadratic       24.a      0.86
Lipid Research Clinics Program, random sample
  Females                            2,198     191    24.5    4.5     48.2     12.4     29.8  None
  Males                              2,436     286    26.3    3.4     46.9     11.7     35.3  Quadratic       25.2      0.88
Lipid Research Clinics Program, hyperlipidemia sample
  Females                            1,532     185    25.8    5.0     49.7     13.0     36.3  None
  Males                              2,009     285    27.3    3.5     46.2     11.4     39.0  Quadratic       26.2      0.79
Hypertension Detection and Follow-up Program, stepped care
  White females                      1,178     103    28.2    6.3     52.7      9.2     29.8  Direct
  White males                        1,893     223    28.1    4.3     50.7      9.6     35.1  Quadratic       29.4      2.53
  Black females                      1,339     133    29.5    7.0     49.6     10.2     37.8  None
  Black males                        1,064     196    27.1    5.2     50.2     10.3     55.5  Quadratic       29.1      2.08
Hypertension Detection and Follow-up Program, referred care
  White females                      1,145     118    28.2    6.4     52.3      9.3     31.2  None
  White males                        1,857     235    23.2    4.3     50.8      9.6     35.2  Quadratic       26.7      1.34
  Black females                      1,352     167    30.2    7.0     49.2     10.1     36.5  Quadratic       33.1      3.36
  Black males                        1,080     240    27.1    4.9     50.9     10.0     56.5  Quadratic       30.5      2.94
Multiple Risk Factor Intervention Trial
  Special Intervention group         5,759     442    27.7    3.4     46.5      6.0     59.3  None
  Usual Care group                   5,800     491    27.7    3.S     46.4      6.0     58.9  Quadratic       26.4      1.64

* Weight (kg)/height (m)^2.
† SD, standard deviation; BMImin, body mass index of minimum mortality.

It is widely accepted that controlling for smoking
when examining the BMI-mortality relation is
necessary to achieve meaningful results in epidemiologic
studies (24). Controlling for smoking makes sense on
a priori grounds. However, to our knowledge, no one
has systematically examined whether control for
smoking actually makes a difference when relating BMI to
mortality. We examined whether controlling for
smoking does, in fact, change results when analyzing the
relation between BMI and mortality. To conduct this
investigation, we reconstructed the analyses for our 42
cohorts while ignoring smoking status. We found that
this omission leaves the qualitative results unchanged
(table 3). The estimates of BMImin do change, however
(figure 1): For the 30 cohorts fitting the quadratic
model, the random effects (25) average BMImin is 24.6
(standard error (SE) 0.33) when smoking is included,
and it increases to 25.3 (SE 0.33) when smoking is
excluded. This examination does not address the
question of whether the quantitative shape of the curve is
altered when smoking is included in the model;
however, on the basis of these analyses, we conclude that
controlling for smoking status changes the results and
therefore data should be controlled for smoking when
the BMI-mortality relation is being analyzed.
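
For readers unfamiliar with random effects averaging, the sketch below pools cohort-specific BMImin estimates using the DerSimonian-Laird moment estimator. That choice of estimator is an assumption made here for illustration; the text cites reference 25 without naming the method, and the function name is hypothetical.

```python
def random_effects_mean(estimates, std_errors):
    """Pool estimates with inverse-variance weights plus between-cohort variance.

    Returns (pooled estimate, its standard error, tau^2), where tau^2 is the
    DerSimonian-Laird moment estimate of between-cohort variance.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se_pooled = (1.0 / sum(w_star)) ** 0.5
    return pooled, se_pooled, tau2
```

Applied to the BMImin and standard error columns of table 2, once with and once without smoking in the model, a routine of this kind would yield the two averages being compared.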


TABLE 3. Proportional hazards models selected using
different models and different subsets of 42 analytic cohorts

Relation                     Model*
selected              1      2      3      4
None                  8     11     15      6
Linear, inverse       0      3      0      1
Linear, direct        1      1      2      1
Quadratic            30     27     25     34
Interaction           3      0      0      0

* Model 1: all observations, with smoking included in the model;
model 2: smokers only; model 3: nonsmokers only; model 4: all
observations, with smoking not included in the model.


Source: https://www.industrydocuments.ucsf.edu/docs/zqjk0001 
FIGURE 1. Comparison of body mass indices (weight (kg)/height (m)^2) of minimum mortality estimated with and without control for
smoking. The average in 30 cohorts with adjustment for smoking was 24.6 (standard error 0.33); when no adjustment was used, the
average was 25.3 (standard error 0.33). The solid line represents equality.


Table 3 shows the results of reanalyzing the data
from each cohort using four different models. The first
column of data in the table shows the results of
analyses in which the best-supported models were selected
for each of the 42 cohorts examined, without deletions.
Smoking was included and interaction was tested. The
second column presents the analyses of smokers only,
and the third the analyses of nonsmokers only. (For the
latter two cases, an interaction model was not defined,
since it was not possible to include smoking in the
model.) The final column presents the results from
analysis of the entire cohort, ignoring smoking. We
obtained similar results regardless of whether we
analyzed nonsmokers or smokers only, and in both cases,
few changes were noted.

Figure 2 presents results for the three interaction
models examined in these analyses. In each case, a
quadratic relation is apparent for both smokers and
nonsmokers (see also table 3). In each of the three cases,
different quadratic models for the BMI-mortality
relation were indicated for smokers and nonsmokers. For
National Health Interview Survey white males and
Israeli males, the basic shape of the model differed for
smokers and nonsmokers but the BMImin values were similar.
In National Health Interview Survey white males, the
BMImin was 26.4 (SE 0.23) for smokers and 26.2 (SE
0.49) for nonsmokers. Among Israeli males, the BMImin
was 22.4 (SE 1.08) for smokers and 22.4 (SE 0.68) for
nonsmokers. For Renfrew and Paisley women, the
BMImin was 27.1 (SE 2.20) for smokers and 23.4 (SE
0.76) for nonsmokers.

In table 4, we present the results of a single
simulation experiment to illustrate the difficulties involved in
excluding observations from cohort analyses. We
eliminated set percentages of randomly selected data from
each cohort prior to employing the algorithm (i.e.,
fitting four proportional hazards regression models and
determining the one that best fits the data using
likelihood ratio statistics). As the percentage of data
excluded increased from 10 percent to 80 percent, we
found an increasing number of changes in comparison
with the inclusion of all data in the model. Since these
deletions were random, the changes achieved could
have no biologic significance. If the elimination of
smokers is a unique requirement for analysis, we
should obtain results discernibly different from those
produced by randomly deleting a proportion of the
sample equal to the number of smokers prior to
analysis. If such discernible differences do not occur (i.e., if
we obtain the same results when randomly deleting a
proportion of the sample equal to the number of
smokers within the sample), then there is nothing significant
about the nature of the data excluded.

We conducted 1,000 repetitions of the simulation
experiment for all 42 cohorts. For each repetition, we
eliminated a percentage of randomly selected data
prior to analysis and then selected the most
appropriate model using our standard algorithm. We repeated
this process 1,000 times and tabulated the number of
cases in which the model selected for the reduced data
set differed from that selected when the complete data
set was analyzed.

FIGURE 2. Log relative risk (RR) of mortality by body mass index
(BMI) (weight (kg)/height (m)^2) in models estimating the interaction
between smoking and the BMI-mortality relation in three cohorts.
Solid line, nonsmokers; dashed line, smokers. NHIS,
National Health Interview Survey.

Am J Epidemiol Vol. 150, No. 12. 1999 


TABLE 4. Results (number of cohorts) of randomly deleting
a specified percentage of observations prior to analysis (n =
42 cohorts)

%                        Relation selected
deleted     None  Inverse   Direct  Quadratic  Interaction
 0             8        0        1         30            3
10             9        0        1         30            2
20            11        0        1         28            2
30            11        0        1         27            3
40            14        1        1         24            2
50            14        2        1         21            4
60            13        2        1         24            2
70            19        0        2         19            2
80            21        1        3         16            1


Table 5 compares the outcomes from the
randomization experiment with those from the analysis of the
BMI-mortality relation involving the complete data
sets. The top half of the table compares results
obtained using the entire cohort with results obtained
after excluding smokers from the analysis. The bottom
half of the table presents the results of the
randomization experiment. For ease of comparison, the numbers
in the lower half of the table have been divided by
1,000. For example, among the 8,000 randomized
simulations conducted on the eight cohorts that indicated
no relation when all of the data were included in the
analysis, 6,134 agreed with this original finding, 1,027
resulted in inverse relations, 42 resulted in direct
relations, and 797 resulted in quadratic relations.
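
The comparison in table 5 can be made concrete with a short calculation of how often the originally selected relation was retained. The sketch below transcribes the table 5 counts; the dictionary layout and the `agreement` helper are illustrative choices, not part of the original analysis.

```python
COLUMNS = ["None", "Inverse", "Direct", "Quadratic"]

# Rows: relation selected with the full cohort (table 5, top half).
NONSMOKERS_ONLY = {
    "None": [7, 0, 0, 1],
    "Direct": [0, 0, 1, 0],
    "Quadratic": [8, 0, 1, 21],
    "Interaction": [0, 0, 0, 3],
}

# Table 5, bottom half: simulation counts already divided by 1,000.
RANDOM_DELETION = {
    "None": [6.134, 1.027, 0.042, 0.797],
    "Direct": [0.676, 0.000, 0.295, 0.029],
    "Quadratic": [4.275, 2.296, 0.121, 23.308],
    "Interaction": [0.010, 0.000, 0.004, 2.986],
}


def agreement(table, row):
    """Share of cohorts in `row` for which the same relation was selected again.

    Only defined for rows that also appear as a column (not "Interaction").
    """
    cells = table[row]
    return cells[COLUMNS.index(row)] / sum(cells)
```

For the eight cohorts that originally showed no relation, `agreement(NONSMOKERS_ONLY, "None")` is 7/8 and `agreement(RANDOM_DELETION, "None")` is 6.134/8; for the 30 quadratic cohorts the corresponding figures are 21/30 and 23.308/30.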

DISCUSSION 

Although health professionals acknowledge the
adverse effects of obesity on health, there is still much
discussion concerning optimal body weight, defined as
the weight an individual should maintain to maximize
wellness and life expectancy. Most observational
studies show a high mortality rate for both the leanest and
the heaviest groups; i.e., there exists a nonmonotonic
relation between BMI and mortality. Explanations for
the finding of a high mortality rate among the lean
vary (26), but some investigators attribute it partly to
smoking. Since smokers tend to be leaner than
nonsmokers and have a higher mortality rate, it is
suggested that the nonmonotonic relation between BMI
and mortality might be eliminated if data were
controlled for smoking. To our knowledge, this widely
held belief has not been tested previously.

TABLE 5. Results (number of cohorts) obtained after deleting smokers (top) and results obtained from
1,000 simulations after randomly deleting the same number of observations as there were smokers in
the cohort (bottom)

Results with                        Relation selected
full cohort          None    Inverse     Direct  Quadratic      Total

                     Results using only nonsmokers
None                    7          0          0          1          8
Direct                  0          0          1          0          1
Quadratic               8          0          1         21         30
Interaction             0          0          0          3          3
Total                  15          0          2         25         42

                     Results (divided by 1,000) after random deletions
None                6.134      1.027      0.042      0.797          8
Direct              0.676      0.000      0.295      0.029          1
Quadratic           4.275      2.296      0.121     23.308         30
Interaction         0.010      0.000      0.004      2.986          3
Total              11.095      3.323      0.462     27.120         42

We examined the effect of smoking on the
BMI-mortality relation using our collection of heterogeneous
studies, which contained national samples, cohort
studies, and clinical trials. We cannot claim that our
data set represents a complete, or even a random,
sample of all of the available studies, though all
major study designs were included within the sample.
Some investigators declined our invitation to
participate in the collaboration, and in one case the
investigators withdrew their study after the analyses were
conducted. However, we have no reason to assume that
our results would change with the inclusion of
additional studies in our comprehensive data set. We also
believe this to be the only analysis to date to have
subjected multiple studies to identical analytic procedures.

We conducted several examinations of sensitivity in
our results to validate our methodology. First, we
adopted an algorithm, which we applied mechanically
to the data from each cohort, to select the most
appropriate model. We selected as part of this algorithm the
p value 0.05 and repeated the analyses using p = 0.10.
The greater value did not alter the outcome
substantially, nor did the outcome change considerably when we added
past smokers, in addition to current smokers, to our
definition of smoking. Finally, we repeated the
analyses using logistic regression rather than the
proportional hazards models, with few changes being noted.

Epidemiologic studies have reported the association
between BMI and mortality to be positive (27), J-shaped
(28), inversely J-shaped (29), U-shaped (30),
nonexistent, and even inverse (31). We conducted these analyses
to determine whether we could reproduce those
conflicting results. We found, however, that when we subjected
these studies to a uniform analytic approach, the results
were surprisingly consistent. A quadratic relation was
found in 30 of the analytic cohorts, and in only three was
an interaction between smoking and BMI discerned.

The question is whether the interaction models
provide evidence that the shape of the basic relation
between BMI and mortality differs for smokers and
nonsmokers in these cohorts. There appears to be a widely
held belief that if smokers are eliminated from a data set
prior to analysis, a direct relation between BMI and
mortality will be observed. Because of this view, we have
provided an in-depth presentation of the interaction
models.

Our findings raise the issue of whether the common
practice of eliminating smokers prior to analysis of the
BMI-mortality relation is justified. In our view, strong
arguments can be made against this practice. The
number of smokers within a cohort is often large, and
deletions that may constitute more than half of the sample
result in loss of power to discern true relations. In
addition, with the sample having been diminished to such an
extent, confusion results concerning the proper
population to which the analytic results apply. Smokers
themselves vary from study to study and do not necessarily
share a set of common characteristics, which makes
these eliminations from analysis problematic.

In our simulation study, we demonstrated that exclud¬ 
ing smokers from a BMl-mortality analysis produces 
findings similar to those resulting from an equal number 
of random exclusions from the data set. On the basis of 
these results, we caution investigators against subsetting 
data without applying formal statistical procedures to 
test whether the omissions are biologically meaningful. 
If the kinds of random eliminations we performed above 
produce outcomes that differ from those resulting from 
the deletion of the specified data, there is evidence that 
such omissions are biologically meaningful and hence 
justified. Otherwise, regardless of widely accepted 
beliefs that certain subsets (such as smokers) will confound 
results, the omissions cannot be justified. 
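
The comparison described above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure used in the analyses: each record is assumed to be a tuple (bmi, smoker, outcome), the turning point of an ordinary least-squares quadratic stands in for the BMI of minimum mortality (the estimates in this paper come from a proportional hazards model), and extremeness is measured as distance from the full-data estimate. All function names and the synthetic data are hypothetical.

```python
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_turning_point(rows):
    """Fit outcome = a + b*u + c*u**2 by least squares (u = centred BMI)
    and return the BMI at the turning point, mean_bmi - b / (2*c)."""
    bmi = [r[0] for r in rows]
    y = [r[2] for r in rows]
    mean_bmi = sum(bmi) / len(bmi)
    u = [x - mean_bmi for x in bmi]  # centring improves numerical stability
    s = [sum(v ** k for v in u) for k in range(5)]         # s[k] = sum of u^k
    t = [sum(v ** k * w for v, w in zip(u, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] for i in range(3)]   # normal equations
    d = det3(m)
    # Cramer's rule for the linear (b) and quadratic (c) coefficients
    b = det3([[m[i][0], t[i], m[i][2]] for i in range(3)]) / d
    c = det3([[m[i][0], m[i][1], t[i]] for i in range(3)]) / d
    return mean_bmi - b / (2 * c)


def deletion_randomization_test(rows, is_target, statistic, n_draws=200, seed=0):
    """Compare a deliberate deletion with equally sized random deletions."""
    rng = random.Random(seed)
    full = statistic(rows)
    kept = [r for r in rows if not is_target(r)]
    observed = statistic(kept)
    # Each draw keeps len(kept) records chosen at random, so it deletes
    # exactly as many records as the deliberate deletion did.
    null = [statistic(rng.sample(rows, len(kept))) for _ in range(n_draws)]
    extreme = sum(abs(v - full) >= abs(observed - full) for v in null)
    p_value = (extreme + 1) / (n_draws + 1)
    return observed, null, p_value


def make_synthetic_rows(n=2000, seed=1):
    """Synthetic cohort: smoking is unrelated to the outcome by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        bmi = rng.gauss(26.0, 4.0)
        smoker = rng.random() < 0.3
        outcome = 0.01 * (bmi - 25.0) ** 2 + rng.gauss(0.0, 0.1)
        rows.append((bmi, smoker, outcome))
    return rows


rows = make_synthetic_rows()
observed, null, p_value = deletion_randomization_test(
    rows, is_target=lambda r: r[1], statistic=quadratic_turning_point)
print(f"turning point without smokers: {observed:.2f}")
print(f"random-deletion range: {min(null):.2f} to {max(null):.2f}")
print(f"randomization p-value: {p_value:.3f}")
```

A small p-value would indicate that removing smokers moves the estimate further than random deletions of the same size typically do; a large one would indicate that the deliberate deletion is indistinguishable from a random one. Passing the statistic as a function means a survival-model estimate could be substituted without changing the test itself.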

The practice of excluding smokers rests on the knowledge 
that smoking adversely affects mortality risk and 
that smokers are known to be leaner than nonsmokers, 
on average. On the basis of these facts, some researchers 
would reason that smoking may account for the upturn 
in mortality at the lower end of the BMI distribution. We 
sympathize with the initial attractiveness of this reasoning. 
However, when one breaks data into subsets, the 
results will often differ among the parts as compared 
with the whole or even compared with each other. What, 
then, should be a guide for drawing inferences? Should 
we simply pick the one we believe or like, particularly if 
it matches our preconceptions, or should there be some 
formal criteria for deciding when the subgroup findings 
are important? We believe that standard criteria should 
be used to make this decision, and here we introduced 
the randomization test as a possible standard against 
which to measure results derived from deletion. If deliberate 
deletions produce results that are clearly discernible 
from random deletions, the deliberate deletions 
may be a valid basis for inference; but when random 
deletions produce analytic results similar to those from a 
deletion based on a particular characteristic, we think 
that the inference based on the deletion is questionable. 

In conclusion, our statistical tests showed no evidence 
of a universal interaction between smoking and BMI 
that affects the BMI-mortality relation. We found that 
while controlling for smoking does change the estimate 
of the BMImin, it does not substantially alter the basic 
shape of the relation. Finally, we demonstrated that the 
results obtained by eliminating smokers from the analyses 
do not differ significantly from results obtained by 
making random deletions from the analytic data set. The 
belief that smoking is responsible for the quadratic relation 
between BMI and mortality or that it explains the 
excess of mortality among the leanest groups is not supported 
by empirical observation or quantitative testing. 
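
To make the role of BMImin in these conclusions concrete, the following derivation assumes a proportional hazards model whose log hazard is quadratic in BMI. The exact parametrization and the coefficient symbols are assumptions introduced for illustration; they are not taken from the fitted models.

```latex
% Assumed model: s = 1 for current smokers, 0 otherwise
\[
h(t) = h_0(t)\,\exp\!\left(\beta_1\,\mathrm{BMI} + \beta_2\,\mathrm{BMI}^2
       + \beta_3\,\mathrm{age} + \beta_4\,s\right)
\]

% Setting the derivative of the log hazard with respect to BMI to zero
\[
\frac{\partial \log h}{\partial\,\mathrm{BMI}} = \beta_1 + 2\beta_2\,\mathrm{BMI} = 0
\quad\Longrightarrow\quad
\mathrm{BMI}_{\min} = -\frac{\beta_1}{2\beta_2}, \qquad \beta_2 > 0
\]

% With hypothetical smoking-by-BMI interaction terms \gamma_1 and \gamma_2,
% the minimum would depend on smoking status
\[
\mathrm{BMI}_{\min}(s) = -\frac{\beta_1 + \gamma_1 s}{2\,(\beta_2 + \gamma_2 s)}
\]
```

Under this assumed form, the smoking main effect does not appear in the expression for BMImin. Adjusting for smoking can still shift the estimate, because it changes the fitted BMI coefficients, but the location of the minimum differs between smokers and nonsmokers only when the interaction terms are nonzero.
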
APPENDIX 

Studies Included in the Analysis 
The National Health Interview Survey 

The National Health Interview Survey (NHIS) is a continuing 
nationwide survey of the US civilian noninstitutionalized 
population conducted through households (2). Data on 
self-reported weight and height are available for all participants. 
Specific health topics (supplements) are added each year to 
the core questionnaire. Information on smoking is available 
only for the years 1987-1990. Smoking status is classified as 
never, former, or current. The cancer risk factor supplement 
provides the smoking data for 1987; the occupational supplement 
provides the data for 1988; and the health promotion and 
disease prevention supplement provides the data for 1990. 
Data on smoking for 1989 are provided in the diabetes supplement; 
smoking status is coded as 1 (yes) or 0 (no), but the 
information is available only for half of the adult sample. We 
recoded smoking status as yes/no for all 4 years by classifying 
only those persons currently using cigarettes as smokers. 
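
The recoding step can be illustrated with a short sketch. The raw codings shown (the strings for the three-category years and the integers 1/0 for 1989) and the function name are hypothetical, since the NHIS variable names and code values are not given here; the 1989 yes/no item is assumed to refer to current cigarette use.

```python
def current_smoker(year, raw):
    """Return 1 for a current cigarette smoker, otherwise 0.

    Hypothetical raw codings: 'never', 'former' or 'current' for the
    1987, 1988 and 1990 supplements; 1 (yes) or 0 (no) for 1989.
    """
    if year == 1989:
        if raw not in (0, 1):
            raise ValueError(f"unexpected 1989 smoking code: {raw!r}")
        return int(raw)
    if year in (1987, 1988, 1990):
        if raw not in ("never", "former", "current"):
            raise ValueError(f"unexpected smoking category: {raw!r}")
        # Never and former smokers are both treated as nonsmokers.
        return 1 if raw == "current" else 0
    raise ValueError("smoking data are available only for 1987-1990")


print(current_smoker(1987, "former"))   # former smokers count as nonsmokers
print(current_smoker(1989, 1))
```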

For 1986 onward, linkage information is available on 
NHIS respondents to allow for matching with other data 
systems, including the National Death Index. The ability to 
link NHIS respondents to the National Death Index provides 
a longitudinal component of the NHIS which allows 
for ascertainment of vital status. To date, data on multiple 
causes of death are available for the NHIS survey years 
1986-1994, with follow-up through December 31, 1995. 

Our analyses included the 121,208 participants for whom 
information on vital status, age, body mass index, and smoking 
status was available from the 1987-1990 surveys. These 
participants contributed four analytic cohorts when stratified 
by sex and ethnic group (table 2). Information on Hispanic 
ethnicity was available for NHIS participants. However, the 
number of deaths among Hispanic participants was too low 
to be analyzed separately, so for this analysis race rather 
than ethnicity was used to classify participants. This resulted 
in most Hispanic participants’ being classified as White. 

The NHANES I Epidemiologic Follow-up Study 

The NHANES I Epidemiologic Follow-up Study data provide 
follow-up for morbidity and mortality among 14,407 
individuals initially aged 25-74 years who received complete 
medical examinations during the First National Health 
and Nutrition Examination Survey (NHANES I), conducted 
from 1971 to 1975 (3). Follow-up surveys were conducted 
from 1982 to 1984, in 1986 (surveying persons aged ≥55 
years at baseline), and again in 1987. For our analyses, we 
used vital status ascertained through the 1987 follow-up. 

The NHANES I Epidemiologic Follow-up Study provided 
four cohorts and 12,730 participants for our analyses 
(table 2). 

The Second National Health and Nutrition 
Examination Survey and mortality follow-up 

The Second National Health and Nutrition Examination 
Survey (NHANES II) was conducted from 1976 through 
1980 in a nationwide probability sample of approximately 
28,000 persons aged 6 months through 74 years from the 
civilian, noninstitutionalized US population. Baseline data 
were similar to those of NHANES I (4). A mortality follow-up 
was conducted for the 9,252 original NHANES II participants 
aged ≥30 years in December 1992, with over 2,000 
deaths being identified. 

This study provided four cohorts and 9,064 participants 
for our analysis (table 2). 

The Framingham Heart Study 

The Framingham Heart Study was begun in 1948 to investigate 
factors associated with the development of cardiovascular 
disease in a representative sample of the adult population 
of Framingham, Massachusetts (5). A random sample of 
households was selected, with a response rate of 69 percent 